Abstract



A recent conjecture of Caputo, Carlen, Lieb, and Loss, and, independently, of the author, states
that the maximum of the permanent of a matrix whose rows are unit vectors in $\ell_p$ is attained
either for the identity matrix $I$ or for a constant multiple of the all-1 matrix $J$.

The conjecture is known to be true for $p = 1$ ($I$) and for $p \ge 2$ ($J$).

We prove the conjecture for a subinterval of $(1,2)$, and show the conjectured upper bound
to be true within a subexponential factor (in the dimension) for all $1 < p < 2$. In fact, for $p$
bounded away from 1, the conjectured upper bound is true within a constant factor.

This leads to a mild (subexponential) improvement in deterministic approximation factor
for the permanent. We present an efficient deterministic algorithm that approximates the
permanent of a nonnegative $n \times n$ matrix within $\exp\{n - \Omega(n/\log n)\}$.



1 Introduction 

Let $A = (a_{ij})$ be an $n \times n$ matrix. The permanent of $A$ is defined as

$$\mathrm{per}(A) = \sum_{\sigma \in S_n} \prod_{i=1}^n a_{i\sigma(i)}.$$

Here $S_n$ is the symmetric group on $n$ elements.
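The definition can be evaluated directly for small matrices. The following Python sketch sums over all permutations; the function name `permanent` is an illustrative choice, and the running time is proportional to $n!$, so this is only meant for checking small examples.

```python
from itertools import permutations
from math import prod


def permanent(a):
    """Permanent of a square matrix (list of rows), summed over all permutations."""
    n = len(a)
    return sum(prod(a[i][s[i]] for i in range(n)) for s in permutations(range(n)))


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
all_ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(permanent(identity), permanent(all_ones))
```

For the identity matrix only the identity permutation contributes, while for the all-1 matrix every one of the $n!$ permutations contributes 1.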

This paper investigates upper bounds on the permanent of matrices with nonnegative entries.
Bregman [3] resolved the Minc conjecture and proved a tight upper bound on the permanent
of a zero-one matrix with given row sums. Here we are interested in upper bounds for
matrices with general nonnegative entries. (For related work see also ^Zj and the references 
there.) 

More specifically, given $1 \le p \le \infty$, we investigate the maximal possible value $U(n,p)$ of the
permanent of a matrix whose rows are unit vectors in $\ell_p$. We give an upper bound on $U(n,p)$
which is tight up to a subexponential (in $n$) multiplicative factor. Since the permanent is a
multilinear function of its rows, this leads to an upper bound on the permanent of an arbitrary
real matrix, given the $\ell_p$ length of its rows.

Let us start with a conjecture claiming that there are only two possible matrices on which 
the maximum of the permanent can be attained. This conjecture is due to Caputo, Carlen, 
Lieb, and Loss and, independently, to the author. 

Conjecture 1.1: Let $1 \le p \le \infty$. The maximum of the permanent of an $n \times n$ matrix whose
rows are unit vectors in $\ell_p$ is attained in one of two cases.

1. On the identity matrix. In this case the permanent is 1.

2. On a matrix all of whose entries are $n^{-1/p}$. In this case the permanent is $\frac{n!}{n^{n/p}}$.

In particular, the maximal possible value of the permanent is

$$U(n,p) = \max\left\{1, \frac{n!}{n^{n/p}}\right\} \qquad (1)$$

∎



Here are some preliminary remarks. Let the dimension $n$ be fixed. The function $f(p) = \frac{n!}{n^{n/p}}$
is increasing. Clearly $f(1) \le 1$ and $f(2) \ge 1$. It is easy to compute the unique value of $p$, lying
in $[1,2]$, for which $f(p) = 1$, that is

$$p_c = \frac{n \log n}{\log n!}.$$

Let $I$ denote the identity matrix, and $J$ the all-1 matrix. The conjecture claims that $I$ is optimal
for $p \in [1, p_c]$ and $n^{-1/p} \cdot J$ is optimal for $p \in [p_c, \infty]$.

In fact, it would suffice to prove the conjecture only for p = p c . 
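The critical value can be checked numerically. The sketch below assumes the formula $p_c = n \log n / \log n!$, obtained by solving $n!/n^{n/p} = 1$ for $p$; the function names `f` and `p_critical` are illustrative.

```python
from math import factorial, log


def f(n, p):
    """f(p) = n! / n^(n/p), the permanent of the matrix n^(-1/p) * J."""
    return factorial(n) / n ** (n / p)


def p_critical(n):
    """The value of p at which f(p) = 1, assuming p_c = n log n / log n!."""
    return n * log(n) / log(factorial(n))


for n in (3, 5, 10):
    pc = p_critical(n)
    print(n, round(pc, 4), f(n, pc))
```

For each $n$ the last printed value should be 1 up to rounding, which is the statement that the two conjectured maximizers $I$ and $n^{-1/p} \cdot J$ have equal permanent at $p = p_c$.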



Lemma 1.2:

• Let $p_0 > 1$ be such that the matrix $I$ is optimal for $p_0$. Then $I$ is the only optimal matrix
for all $1 \le p < p_0$.

• Let $p_0$ be such that the matrix $n^{-1/p_0} \cdot J$ is optimal for $p_0$. Then $n^{-1/p} \cdot J$ is the only optimal
matrix for all $p > p_0$.



Let us now present the known results. 



1. The case p = 1 is trivial. For any n only the identity matrix is optimal, and U(n, 1) = 1. 
2. The conjecture is also known to be true for $p \ge 2$. In this case the optimal matrix is
$n^{-1/p} \cdot J$, and $U(n,p) = \frac{n!}{n^{n/p}}$. Different proofs of this fact were given in |^JE3EI1- Later
it was pointed out |5] that this case was, essentially, already dealt with in |IJS|. More 
specifically, the proof of [TUj is a special case of an argument in (Proposition 9.1.1, 
Appendix 1). 

To the best of our knowledge, the first published proof specifically treating this case
appeared recently in [I]. Furthermore, this paper (independently) states Conjecture 1.1,
attributing it also to P. Caputo.

Let us also mention that results in [7] imply Conjecture 1.1 for p > n.

3. The case $1 < p < 2$. This case seems to be the most interesting.
Clearly, one direction in (1) is trivially true: $U(n,p) \ge \max\left\{1, \frac{n!}{n^{n/p}}\right\}$.
In the other direction, $U(n,p) \le U(n,2) = \frac{n!}{n^{n/2}}$.

This upper bound on $U(n,p)$ was improved in [3]. They show the function $U(n,p)$ to
be logarithmically convex in $1/p$. This, together with the known values $U(n,1) = 1$ and
$U(n,2) = \frac{n!}{n^{n/2}}$, leads to an upper bound

$$U(n,p) \le \left(\frac{n!}{n^{n/2}}\right)^{2 - 2/p}.$$



In this paper we show the conjecture to hold in the interval $[1, p_0]$ where

$$p_0 = \frac{n \log n - (n-1)\log(n-1)}{\log n}.$$

For $n > 2$ holds $1 < p_0 < p_c < 2$.

It is interesting to compare $p_0$ with $p_c$. We have $p_c < \frac{n \log n}{\log (n/e)^n} = 1 + \frac{1}{\log n - 1}$. And

$$p_0 = \frac{\log n + (n-1)\log\frac{n}{n-1}}{\log n} > \frac{\log n + (n-1)/n}{\log n} = 1 + \frac{1}{\log n} - \frac{1}{n \log n}.$$

Thus $p_c$ and $p_0$ are only about $\frac{1}{\log^2 n}$ apart.

The proximity of $p_0$ and $p_c$, together with log-convexity of $U(n,p)$, already suffice for giving
an upper bound on $U(n,p)$ for all $p \in (1,2)$ which is tight up to a simply exponential factor (in
$n$). The approach we take will lead to a somewhat tighter estimate, which has a subexponential
error in the worst case.
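To get a feeling for how close the two thresholds are, one can tabulate them. This is a small Python sketch assuming the formulas $p_0 = (n \log n - (n-1)\log(n-1))/\log n$ and $p_c = n \log n / \log n!$; the function names are illustrative.

```python
from math import factorial, log


def p_zero(n):
    """p_0 = (n log n - (n-1) log(n-1)) / log n."""
    return (n * log(n) - (n - 1) * log(n - 1)) / log(n)


def p_critical(n):
    """p_c = n log n / log n!."""
    return n * log(n) / log(factorial(n))


for n in (3, 10, 100, 1000):
    p0, pc = p_zero(n), p_critical(n)
    # columns: n, p_0, p_c, the gap p_c - p_0, and 1 / log^2 n for comparison
    print(n, round(p0, 4), round(pc, 4), round(pc - p0, 4), round(1 / log(n) ** 2, 4))
```

The table lets one compare the gap $p_c - p_0$ with $1/\log^2 n$ and confirm the ordering $1 < p_0 < p_c < 2$ for the sampled values of $n$.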

Our main results are given in the following theorem. 
Theorem 1.3: Let $n$ be fixed, and let $p_0 = \frac{n \log n - (n-1)\log(n-1)}{\log n}$.

1. The conjecture is true for $1 \le p \le p_0$. The identity matrix is optimal for $1 \le p \le p_0$,
and

$$U(n,p) = 1$$

2. For $p_0 < p < 2$ holds

$$\max\left\{1, \frac{n!}{n^{n/p}}\right\} \le U(n,p) \le \exp\left\{\frac{p-1}{p} \cdot e^{1/(p-1)}\right\} \cdot \frac{n!}{n^{n/p}}.$$

Observe that this bound is $\exp\{n/\log n\}$-tight in the worst case. For $p$ bounded away
from 1, this bound is tight within a constant factor.

1.1 Approximating the permanent 

The original motivation for this study was computational. The goal is to construct an efficient 
deterministic algorithm that approximates the permanent of a given nonnegative matrix within 
a reasonable multiplicative factor. (A randomized algorithm to approximate the permanent 
with arbitrary precision was constructed in 

In this problem was reduced to the case in which the input matrix is doubly stochastic. 
This immediately gave an $e^n$-approximation, since the permanent of a doubly stochastic matrix
lies between $\frac{n!}{n^n}$ and 1. Here the upper bound is trivial, while the lower bound is a deep
theorem of Egorychev [Sj and Falikman (HJ, proving a conjecture by van der Waerden. In this 
light, it seems natural to look for more informative upper bounds, which could lead to better
approximation factors for the doubly-stochastic, and thus, for the general case. 

Our results lead to an improvement of $\exp\{\Omega(n/\log n)\}$ in the approximation factor. We
note that a polynomial (in n) improvement in the approximation factor was recently obtained 
in0. ' 

The main tool is a permanental inequality which might be of independent interest. This
inequality is an immediate consequence of Theorem 1.3.

Proposition 1.4: Let $n \ge 2$ be an integer. Let $p_0 = \frac{n \log n - (n-1)\log(n-1)}{\log n}$. Then for any
stochastic $n \times n$ matrix $A = (a_{ij})$ holds

$$\mathrm{Per}\left(\left(a_{ij}^{1/p_0}\right)\right) \le 1.$$

Corollary 1.5: There is a deterministic polynomial-time algorithm to approximate the permanent
of a given nonnegative $n \times n$ matrix within a multiplicative factor of $e^n \cdot e^{-\Omega(n/\log n)}$.

Proof: (Of the corollary) It is sufficient to present an algorithm approximating the permanent 
of a given doubly stochastic matrix within this factor. 

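Let $q_0 = 1/p_0$. Assume $n > 5$. Let $A$ be a doubly stochastic matrix. Let $\sigma \in S_n$ be a
permutation such that $\prod_{i=1}^n a_{i\sigma(i)}$ is maximal.¹ Then there are two cases.

¹ Finding $\sigma$ amounts to finding a maximal weight perfect matching in a given bipartite graph with $2n$ vertices,
and can be done efficiently.

• $\prod_{i=1}^n a_{i\sigma(i)} \ge 2^{-n}$. Then

$$2^{-n} \le \prod_{i=1}^n a_{i\sigma(i)} \le \mathrm{Per}(A) \le 1$$

• $\prod_{i=1}^n a_{i\sigma(i)} < 2^{-n}$. In this case every term of the permanent satisfies
$\prod_i a_{i\tau(i)} \le 2^{(q_0 - 1)n} \cdot \left(\prod_i a_{i\tau(i)}\right)^{q_0}$ and hence, by the proposition,

$$\frac{n!}{n^n} \le \mathrm{Per}(A) \le 2^{(q_0-1)n} \cdot \mathrm{Per}\left(\left(a_{ij}^{q_0}\right)\right) \le 2^{(q_0-1)n} \le e^{-\Omega(n/\log n)}$$

∎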
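The two cases of the proof can be sketched in Python as follows. This is only an illustration under stated assumptions: the input is taken to be doubly stochastic already, the maximizing permutation is found by brute force rather than by a maximum-weight matching algorithm, the upper bound in the second case assumes the proposition in the form $\mathrm{Per}((a_{ij}^{1/p_0})) \le 1$, and the function names `permanent`, `p_zero` and `permanent_bounds` are hypothetical.

```python
from itertools import permutations
from math import factorial, log, prod


def permanent(a):
    """Brute-force permanent, used here only to check the bounds."""
    n = len(a)
    return sum(prod(a[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def p_zero(n):
    """p_0 = (n log n - (n-1) log(n-1)) / log n."""
    return (n * log(n) - (n - 1) * log(n - 1)) / log(n)


def permanent_bounds(a):
    """Return (lower, upper) bounds on the permanent of a doubly stochastic matrix."""
    n = len(a)
    # Brute force stands in for a maximum-weight perfect matching computation.
    best = max(prod(a[i][s[i]] for i in range(n)) for s in permutations(range(n)))
    if best >= 2.0 ** (-n):
        return best, 1.0  # first case: best <= Per(A) <= 1
    q0 = 1 / p_zero(n)
    # second case: n!/n^n <= Per(A) <= 2^((q0 - 1) n)
    return factorial(n) / n ** n, 2.0 ** ((q0 - 1) * n)


n = 6
uniform = [[1 / n] * n for _ in range(n)]
print(permanent_bounds(uniform), permanent(uniform))
```

In either case the ratio of the upper to the lower bound is what determines the approximation factor; a polynomial-time implementation would replace the brute-force search by an assignment-problem solver.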

1.2 Generalizations of Minc's conjecture to general nonnegative matrices

The Minc conjecture, proved by Bregman, states that for a zero-one matrix $A$ with $r_i$ ones in
row $i$, $1 \le i \le n$,

$$\mathrm{per}(A) \le \prod_{i=1}^n (r_i!)^{1/r_i}$$

and equality holds if and only if $A$ is a block-diagonal matrix, and all the blocks are all-1
matrices.²

Let $\phi : [0,1] \to [0,1]$ be a function taking $1/r$ to $1/(r!)^{1/r}$, for all integer $r$. Given a matrix
$A$ with entries in $[0,1]$, let $\phi(A)$ denote a matrix whose $(i,j)$-th entry is $\phi(a_{ij})$. Consider a
stochastic matrix $A = (a_{ij})$ whose $i$-th row has entries with two possible values: $r_i$ entries with
value $1/r_i$ and $n - r_i$ entries valued 0. Then the Bregman bound implies

$$\mathrm{per}(\phi(A)) \le 1,$$

and equality holds iff $A$ is a block-diagonal matrix with blocks which are constant multiples of
all-1 matrices.

A natural way to extend $\phi$ to the whole interval $[0,1]$ is by taking $\phi(x) = \Gamma(1/x + 1)^{-x}$, for
all $0 < x \le 1$, and setting $\phi(0) = 0$. The following conjecture generalizes the Minc conjecture.

Conjecture 1.6: For any stochastic matrix A holds 

$$\mathrm{per}(\phi(A)) \le 1$$

and equality holds iff $A$ is a block-diagonal matrix with blocks which are constant multiples of
all-1 matrices. ∎

The function $\phi(x) = \Gamma(1/x + 1)^{-x}$ is strictly monotone and takes $[0,1]$ onto $[0,1]$. It is also
concave [14].

Let $K = \{x \in \mathbb{R}^n : \sum_{i=1}^n \phi^{-1}(|x_i|) \le 1\}$. This is a convex ball in $\mathbb{R}^n$ defining a norm $\|\cdot\|_K$.
Consider the following optimization problem: Choose $n$ unit vectors $x^{(1)}, \ldots, x^{(n)}$ in $\mathbb{R}^n$ endowed
with the norm $\|\cdot\|_K$ as rows of a matrix so that the permanent of this matrix is as large as
possible.³

² Up to a permutation of rows or columns.

An alternative way to state Conjecture 1.6 is to say that all the optimal solutions to this
optimization problem are obtained as follows: partition $\{1, \ldots, n\}$ into disjoint subsets $S_1, \ldots, S_k$.
For each $j = 1, \ldots, k$ choose all the vectors $x^{(i)}$, $i \in S_j$, to be equal to $\frac{1}{(|S_j|!)^{1/|S_j|}} \cdot 1_{S_j}$, that is, to be $\frac{1}{(|S_j|!)^{1/|S_j|}}$ on
the coordinates in $S_j$, and 0 elsewhere.

The function $\phi$ and the norm it defines are somewhat complicated to deal with. A natural
"easier" family of norms to consider as a test case are the $\ell_p$ norms, $1 \le p \le \infty$. This, in fact,
was the starting point of this study.

We conclude the introduction by stating a conjecture which is a common generalization
of both Minc's conjecture and Conjecture 1.1. Following the discussion in Lemma 1.2, Conjecture 1.1
is equivalent to $U(n, p_c) = 1$. Here $p_c = \frac{n \log n}{\log n!}$ is the 'critical' value of $p$ for
$n$-dimensional matrices.

Let $p_c(r) = \frac{r \log r}{\log r!}$ for integer $r$. For $0 < r_1, r_2, \ldots, r_n \le n$ and $1 \le p_1, \ldots, p_n \le \infty$ let
$U(n; r_1, \ldots, r_n; p_1, \ldots, p_n)$ be the maximum of the permanent of an $n \times n$ matrix whose $i$-th row
is a unit vector in $\ell_{p_i}$ supported on at most $r_i$ non-zero coordinates. Then

Conjecture 1.7:

$$U(n; r_1, \ldots, r_n; p_c(r_1), \ldots, p_c(r_n)) \le 1$$

∎

It is straightforward to check that for zero-one matrices this conjecture is equivalent to the
Minc conjecture. For $r_1 = r_2 = \cdots = r_n = n$ it reduces to Conjecture 1.1.

We remark that the proof of Theorem 1.3 easily generalizes to give

$$U(n; r_1, \ldots, r_n; p_0(r_1), \ldots, p_0(r_n)) \le 1$$

where $p_0(r) = \frac{r \log r - (r-1)\log(r-1)}{\log r}$.

A word on our methods and an acknowledgement. Our proof of Theorem 1.3 proceeds along
the lines of Bregman's proof of the Minc conjecture. A key inequality in that proof has to be
replaced by a more general inequality of .1 j, quoted as Theorem 2.3 below. We are grateful to
Leonid Gurvits for directing us to this inequality.

2 A recursive bound on U(n,p) 

Let $1 \le p \le \infty$ be fixed. Let $q = 1/p$.

A vector $y = (y_1, \ldots, y_n) \in \mathbb{R}^n$ is stochastic if its coordinates are nonnegative and sum to 1.
Consider the following function defined on the set $\Delta$ of stochastic vectors:

$$P(y) = \sum_{i=1}^n y_i^q \prod_{j \ne i} (1 - y_j)^q.$$

This is a continuous bounded function which attains its maximum on $\Delta$.

³ Replacing permanent with determinant one arrives at questions about the maximal volume subcube of $K$.
These questions are of interest in convex geometry. The two contexts seem to be very different, however.



Definition 2.1:

$$w(n,p) = \max_{y \in \Delta} P(y)$$



The main claim of this section is: 
Theorem 2.2:

$$U(n,p) \le \prod_{k=1}^n w(k,p)$$

Proof: The proof is by induction on $n$. For $n = 1$, $U(1,p) = w(1,p) = 1$.

Consider an optimization problem

Maximize $\mathrm{Per}\left(\lambda_{ij}^q\right)$

Given $\lambda_{ij} \ge 0$, and $\sum_{j=1}^n \lambda_{ij} = 1$ for all $i$.

Clearly the optimal value here is $U(n,p)$.

A key element of our proof is an inequality of PP, which we state next. 

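Theorem 2.3: Let $p(x, \lambda)$ be a nonnegative function defined on a space $X \times \Lambda$ and let $\mu$ be a
nonnegative weight function on $X$.

Let $P(\lambda) = \sum_{x \in X} \mu(x) p(x, \lambda)$, and $Q(\lambda, \bar\lambda) = \sum_{x \in X} \mu(x) p(x, \lambda) \log p(x, \bar\lambda)$.

Then $Q(\lambda, \bar\lambda) \ge Q(\lambda, \lambda)$ implies $P(\bar\lambda) > P(\lambda)$ unless $p(x, \bar\lambda) = p(x, \lambda)$ for all $x$ with $\mu(x) > 0$.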

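Now we apply Theorem 2.3 in our setting.

Let $X = S_n$ be the symmetric group on $n$ elements, and $\Lambda$ be the set of all stochastic
matrices $(\lambda_{ij})$. Let $\mu(\sigma) = 1$ for all permutations $\sigma \in S_n$ and let $p(\sigma, \lambda) = \prod_{i=1}^n \lambda_{i\sigma(i)}^q$, for
$\sigma \in S_n$ and $\lambda \in \Lambda$. Then $P(\lambda) = \sum_{x \in X} \mu(x) p(x, \lambda) = \mathrm{Per}\left(\lambda_{ij}^q\right)$.

Let $\lambda[i,j]$ be the $(n-1) \times (n-1)$ matrix obtained from $\lambda$ by deleting the $i$-th row and $j$-th
column. Let $\lambda^q[i,j]$ be the matrix obtained from $\lambda[i,j]$ by raising each entry to the $q$-th power. Let
$\bar\lambda = (\bar\lambda_{ij})$ with

$$\bar\lambda_{ij} = \frac{\lambda_{ij}^q \, \mathrm{Per}(\lambda^q[i,j])}{\mathrm{Per}\left(\lambda_{ij}^q\right)}.$$

The following lemma is a direct consequence of Theorem 2.3.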
Lemma 2.4:

$$\mathrm{Per}\left(\bar\lambda_{ij}^q\right) \ge \mathrm{Per}\left(\lambda_{ij}^q\right)$$

Proof: Consider the optimization problem of maximizing $Q(\lambda, \bar\lambda)$ given $\lambda$. We have

$$Q(\lambda, \bar\lambda) = \sum_{\sigma \in S_n} p(\sigma, \lambda) \log p(\sigma, \bar\lambda) = q \cdot \sum_{\sigma \in S_n} p(\sigma, \lambda) \sum_{i=1}^n \log \bar\lambda_{i\sigma(i)} = q \cdot \sum_{i,j=1}^n \lambda_{ij}^q \, \mathrm{Per}(\lambda^q[i,j]) \log \bar\lambda_{ij}$$

The constraints on $\bar\lambda$ are that it is a stochastic matrix. Therefore we have $n$ independent
optimization problems of the form:

Maximize $\sum_j w_j \log y_j$ Given $y_j \ge 0$, $\sum_j y_j = 1$,

where $w_j$ are nonnegative constants. Assuming not all $w_j$ are zero, which we may and will do
in our case, the only solution of this problem is $y_j = \frac{w_j}{\sum_k w_k}$. This is a simple consequence of
the concavity of the logarithm.

Fixing $1 \le i \le n$, and substituting $w_j = \lambda_{ij}^q \, \mathrm{Per}(\lambda^q[i,j])$ and $y_j = \bar\lambda_{ij}$, we see that optimal
$\bar\lambda$ is given by $\bar\lambda_{ij} = \frac{\lambda_{ij}^q \, \mathrm{Per}(\lambda^q[i,j])}{\mathrm{Per}\left(\lambda_{ij}^q\right)}$. The claim of the lemma now follows from Theorem 2.3. ∎

Now, following [2], we write



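$$\mathrm{Per}\left(\lambda_{ij}^q\right) \le \mathrm{Per}\left(\bar\lambda_{ij}^q\right) = \sum_{\sigma \in S_n} \prod_{i=1}^n \bar\lambda_{i\sigma(i)}^q = \sum_{\sigma \in S_n} \prod_{i=1}^n \left( \frac{\lambda_{i\sigma(i)}^q \, \mathrm{Per}(\lambda^q[i,\sigma(i)])}{\mathrm{Per}\left(\lambda_{ij}^q\right)} \right)^q$$

Let $(\lambda_{ij}^q)$ be an optimal matrix, that is $\mathrm{Per}\left((\lambda_{ij}^q)\right) = U(n,p)$. Then

$$U(n,p)^{nq+1} \le \sum_{\sigma \in S_n} \prod_{i=1}^n \lambda_{i\sigma(i)}^{q^2} \, \mathrm{Per}(\lambda^q[i,\sigma(i)])^q$$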



Consider the matrix $\lambda[i,j]$. This is an $(n-1) \times (n-1)$ matrix with row sums $r_k = 1 - \lambda_{kj}$,
for $k = 1, \ldots, n$, $k \ne i$. Let $P$ be the $(n-1) \times (n-1)$ diagonal matrix with $1/r_k$ on the
diagonal. Then $(a_{st}) = P \cdot \lambda[i,j]$ is a stochastic matrix, and therefore, by induction hypothesis,
$\mathrm{Per}\left(a_{st}^q\right) \le U(n-1,p)$.

This means $\mathrm{Per}(\lambda^q[i,j]) \le U(n-1,p) \cdot \prod_{k \ne i} (1 - \lambda_{kj})^q$. Substituting this in the inequality
above, we obtain



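$$U(n,p)^{nq+1} \le U(n-1,p)^{nq} \cdot \sum_{\sigma \in S_n} \prod_{i=1}^n \left( \lambda_{i\sigma(i)}^{q^2} \prod_{k \ne i} \left(1 - \lambda_{k\sigma(i)}\right)^{q^2} \right)$$

$$= U(n-1,p)^{nq} \cdot \prod_{i,j=1}^n (1 - \lambda_{ij})^{q^2} \cdot \sum_{\sigma \in S_n} \prod_{i=1}^n \left( \frac{\lambda_{i\sigma(i)}}{1 - \lambda_{i\sigma(i)}} \right)^{q^2}$$

The third term in this expression is the permanent of a matrix $(a_{ij}^q)$, where $a_{ij} = \left( \frac{\lambda_{ij}}{1 - \lambda_{ij}} \right)^q$.

Let $r_i = \sum_{j=1}^n a_{ij} = \sum_{j=1}^n \left( \frac{\lambda_{ij}}{1 - \lambda_{ij}} \right)^q$ be the row sums of the matrix $(a_{ij})$. Then, $\mathrm{Per}\left(a_{ij}^q\right) \le
U(n,p) \cdot \prod_{i=1}^n r_i^q$. Substituting in the inequality above gives

$$U(n,p)^{nq} \le U(n-1,p)^{nq} \cdot \prod_{i,j=1}^n (1 - \lambda_{ij})^{q^2} \cdot \left( \prod_{i=1}^n \sum_{j=1}^n \left( \frac{\lambda_{ij}}{1 - \lambda_{ij}} \right)^q \right)^q$$

Taking $q$-th roots of both sides this simplifies to

$$U(n,p)^n \le U(n-1,p)^n \cdot \prod_{i,j=1}^n (1 - \lambda_{ij})^q \cdot \prod_{i=1}^n \sum_{j=1}^n \left( \frac{\lambda_{ij}}{1 - \lambda_{ij}} \right)^q$$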

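Let $\lambda_i$ be the $i$-th row vector of $\lambda$. Since $\lambda$ is a stochastic matrix, $\lambda_i$ is a stochastic vector. We
have

$$\prod_{i,j=1}^n (1 - \lambda_{ij})^q \cdot \prod_{i=1}^n \sum_{j=1}^n \left( \frac{\lambda_{ij}}{1 - \lambda_{ij}} \right)^q = \prod_{i=1}^n P(\lambda_i)$$

Since $P(\lambda_i) \le w(n,p)$ for every $i$, we get $U(n,p)^n \le U(n-1,p)^n \cdot w(n,p)^n$.
Therefore $U(n,p) \le U(n-1,p) \cdot w(n,p)$. The claim now follows from the induction hypothesis

$$U(n,p) \le U(n-1,p) \cdot w(n,p) \le w(n,p) \cdot \prod_{k=1}^{n-1} w(k,p) = \prod_{k=1}^n w(k,p)$$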
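The monotonicity step of Lemma 2.4 used in this proof can be sanity-checked numerically. The Python sketch below assumes the update rule $\bar\lambda_{ij} = \lambda_{ij}^q \, \mathrm{Per}(\lambda^q[i,j]) / \mathrm{Per}(\lambda_{ij}^q)$ read off from the proof; the helper names, the random test matrix, and the values $n = 4$, $q = 0.7$ are arbitrary illustrative choices.

```python
from itertools import permutations
from math import prod
import random


def permanent(a):
    """Permanent of a square matrix, summed over all permutations."""
    n = len(a)
    return sum(prod(a[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def minor(a, i, j):
    """Matrix obtained by deleting row i and column j."""
    n = len(a)
    return [[a[r][c] for c in range(n) if c != j] for r in range(n) if r != i]


def power(a, q):
    """Raise every entry to the q-th power."""
    return [[x ** q for x in row] for row in a]


def update(lam, q):
    """Map a stochastic matrix lam to lam_bar as in the proof (assumed form)."""
    n = len(lam)
    lam_q = power(lam, q)
    total = permanent(lam_q)
    return [[lam_q[i][j] * permanent(minor(lam_q, i, j)) / total for j in range(n)]
            for i in range(n)]


random.seed(0)  # fixed seed, arbitrary choice
n, q = 4, 0.7
raw = [[random.random() for _ in range(n)] for _ in range(n)]
lam = [[x / sum(row) for x in row] for row in raw]  # normalize rows to sum to 1
lam_bar = update(lam, q)
print(permanent(power(lam, q)), permanent(power(lam_bar, q)))
```

The rows of `lam_bar` sum to 1 by the row expansion of the permanent, and the second printed value should be at least the first.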



3 Proofs of the main results 

Our first order of business is to determine $w(k,p)$, for $1 \le k \le n$. Let $1 < p < 2$ be fixed, and
let $q = 1/p$.

Let $\theta(k) = k \cdot \left( \frac{(k-1)^{k-1}}{k^k} \right)^q$ for integer $k \ge 2$ and let $\theta(1) = 1$.

Theorem 3.1: Fix $k \ge 1$. The maximum of $P(y) = \sum_{i=1}^k y_i^q \prod_{j \ne i} (1 - y_j)^q$ is attained either
at a standard basis vector and then $w(k,p) = \theta(1) = 1$, or at the all-$1/k$ vector, in which case
$w(k,p) = \theta(k)$.
The proof of Theorem 3.1 is technical and is relegated to the Appendix.
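As a numerical sanity check of Theorem 3.1 (not a substitute for the proof), the Python sketch below evaluates $P(y) = \sum_i y_i^q \prod_{j \ne i}(1-y_j)^q$ and $\theta(k) = k \cdot ((k-1)^{k-1}/k^k)^q$ and compares them on the two candidate maximizers and on random stochastic vectors. The function names, the choice $q = 0.6$, and the sampling scheme are illustrative assumptions.

```python
from math import prod
import random


def P(y, q):
    """P(y) = sum_i y_i^q * prod_{j != i} (1 - y_j)^q for a stochastic vector y."""
    k = len(y)
    return sum(y[i] ** q * prod((1 - y[j]) ** q for j in range(k) if j != i)
               for i in range(k))


def theta(k, q):
    """theta(k) = k * ((k-1)^(k-1) / k^k)^q, with theta(1) = 1."""
    if k == 1:
        return 1.0
    return k * ((k - 1) ** (k - 1) / k ** k) ** q


q, k = 0.6, 4
print(P([1.0] + [0.0] * (k - 1), q), P([1 / k] * k, q), theta(k, q))

random.seed(1)  # fixed seed, arbitrary choice
samples = []
for _ in range(200):
    raw = [random.random() for _ in range(k)]
    samples.append([v / sum(raw) for v in raw])
print(max(P(y, q) for y in samples), max(1.0, theta(k, q)))
```

The value of `P` at a standard basis vector is 1, its value at the all-$1/k$ vector coincides with `theta(k, q)`, and no sampled vector should exceed the larger of the two.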

We briefly discuss the claim of the theorem. Let $I_k$ be the $k \times k$ identity matrix. Let $J_k$
denote the matrix $k^{-1/p} \cdot J$, where $J$ is the all-1 $k \times k$ matrix. Note that $\theta(k) = \frac{\mathrm{per}(J_k)}{\mathrm{per}(J_{k-1})}$.
Therefore the theorem, combined with Theorem 2.2, says that for any $k \ge 2$

$$\frac{U(k,p)}{U(k-1,p)} \le \max\left\{ \frac{\mathrm{per}(I_k)}{\mathrm{per}(I_{k-1})}, \frac{\mathrm{per}(J_k)}{\mathrm{per}(J_{k-1})} \right\}$$

Let us observe that this inequality agrees well with Conjecture 1.1.

The last step before the proof of Theorem 1.3 is Lemma 1.2 which we prove now.
Proof: (Lemma 1.2)

The following notation will be convenient. For $1 \le p \le \infty$, let $\Omega(n,p)$ be the set of $n \times n$
matrices whose rows are unit vectors in $\ell_p$.

We need the following well-known fact. Let $1 \le p < p' \le \infty$. Let $a$ be a vector in $\mathbb{R}^n$. Then

$$1 \le \frac{\|a\|_p}{\|a\|_{p'}} \le n^{\frac{1}{p} - \frac{1}{p'}} \qquad (3)$$

Equality on the left is possible only for a multiple of a standard basis vector, and equality on
the right is possible only for a multiple of the all-1 vector.
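The norm comparison and its two equality cases are easy to check numerically. In this Python sketch the helper `norm`, the test vector, and the exponents $p = 1.5$, $p' = 3$ are illustrative choices, and only finite exponents are handled.

```python
def norm(a, p):
    """The l_p norm of a vector for finite p >= 1."""
    return sum(abs(x) ** p for x in a) ** (1 / p)


a = [3.0, -1.0, 2.0, 0.5]
p, p2 = 1.5, 3.0  # p2 plays the role of p'
n = len(a)
ratio = norm(a, p) / norm(a, p2)
upper = n ** (1 / p - 1 / p2)
print(ratio, upper)

basis = [0.0, 2.0, 0.0, 0.0]   # multiple of a standard basis vector
ones = [1.0, 1.0, 1.0, 1.0]    # all-1 vector
print(norm(basis, p) / norm(basis, p2), norm(ones, p) / norm(ones, p2))
```

The first ratio lies strictly between the two bounds, while the basis vector attains the left bound and the all-1 vector attains the right one.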

Let $p_0$ be such that the matrix $I$ is optimal for $p_0$. Let $p < p_0$. Let $A \in \Omega(n,p)$ with rows
$a_1, \ldots, a_n$. Let $D = (d_{ii})$ be a diagonal matrix with $d_{ii} = \frac{\|a_i\|_p}{\|a_i\|_{p_0}}$. Then $DA$ is in $\Omega(n, p_0)$ and
therefore

$$\mathrm{per}(A) = \mathrm{per}\left(D^{-1} \cdot (DA)\right) = \mathrm{per}\left(D^{-1}\right) \cdot \mathrm{per}(DA) = \prod_{i=1}^n \frac{\|a_i\|_{p_0}}{\|a_i\|_p} \cdot \mathrm{per}(DA)$$

$$\le \prod_{i=1}^n \frac{\|a_i\|_{p_0}}{\|a_i\|_p} \cdot \mathrm{per}(I) = \prod_{i=1}^n \frac{\|a_i\|_{p_0}}{\|a_i\|_p} \le 1$$

By (3), equality is only possible if all the rows $a_i$ are standard basis vectors, and $A$ is the identity
matrix, up to permuting coordinates.

This proves the first claim of the lemma. The proof of the second claim proceeds along
similar lines, using the second half of inequality (3). We omit the details. ∎



Proof: (Theorem 1.3)

Fix $p = p_0 = \frac{n \log n - (n-1)\log(n-1)}{\log n}$ and let $q = 1/p$. The value of $p$ is chosen precisely so that

$$\theta(n) = n \cdot \left( \frac{(n-1)^{n-1}}{n^n} \right)^q = 1.$$

By Theorem 2.2, Theorem 3.1 and Lemma 4.1

$$U(n,p) \le \prod_{k=1}^n w(k,p) \le \prod_{k=1}^n \max\{1, \theta(k)\} \le \left( \max\{1, \theta(n)\} \right)^n = 1$$

Therefore $I$ is optimal for $p = p_0$. Lemma 1.2 completes the proof of the first claim of the
theorem.



Now, to the second claim. Fix $p \in (1,2)$. Let $q = 1/p$. By Lemma 4.1 there is an integer $k_0$
such that $\theta(k) \le 1$ for $k \le k_0$ and $\theta(k) \ge 1$ for $k > k_0$. Since $\theta(k) = \frac{\mathrm{per}(J_k)}{\mathrm{per}(J_{k-1})}$ this means that

$$\mathrm{per}(J_{k_0}) = \prod_{k=1}^{k_0} \theta(k) = \min_{k \ge 1} \mathrm{per}(J_k).$$

Therefore,

$$U(n,p) \le \prod_{k=1}^n w(k,p) \le \prod_{k=1}^n \max\{1, \theta(k)\} = \prod_{k=2}^n \max\left\{1, \frac{\mathrm{per}(J_k)}{\mathrm{per}(J_{k-1})}\right\} = \frac{\mathrm{per}(J_n)}{\mathrm{per}(J_{k_0})} = \frac{\mathrm{per}(J_n)}{\min_{k \ge 1} \mathrm{per}(J_k)}$$

It remains to estimate the denominator on the right.
We have

$$\min_{k \ge 1} \mathrm{per}(J_k) = \min_{k \ge 1} \frac{k!}{k^{qk}} \ge \min_{k \ge 1} \frac{k^{(1-q)k}}{e^k} \ge \min_{x \ge 1} \frac{x^{(1-q)x}}{e^x}$$

where in the last inequality an integer variable $k$ is replaced with a real variable $x$. A simple
analysis gives that the minimum on the right hand side is attained for $x = \exp\{q/(1-q)\} =
\exp\{1/(p-1)\}$ and equals $\exp\left\{ -\frac{p-1}{p} \cdot e^{1/(p-1)} \right\}$.

Therefore

$$U(n,p) \le \exp\left\{ \frac{p-1}{p} \cdot e^{1/(p-1)} \right\} \cdot \mathrm{per}(J_n) = \exp\left\{ \frac{p-1}{p} \cdot e^{1/(p-1)} \right\} \cdot \frac{n!}{n^{n/p}}$$

This completes the proof of the second claim and of the theorem. ∎
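The "simple analysis" of the minimum of $x^{(1-q)x}/e^x$ can be confirmed numerically. The Python sketch below assumes the closed forms derived in the proof, namely the minimizer $x = \exp\{1/(p-1)\}$ and the minimum value $\exp\{-\frac{p-1}{p} e^{1/(p-1)}\}$; the function names, the value $p = 1.5$, and the grid are illustrative choices.

```python
from math import exp


def g(x, p):
    """g(x) = x^((1-q) x) / e^x with q = 1/p."""
    q = 1 / p
    return x ** ((1 - q) * x) / exp(x)


def minimizer(p):
    """Claimed minimizer x = exp(1/(p-1))."""
    return exp(1 / (p - 1))


def minimum(p):
    """Claimed minimum value exp(-(p-1)/p * e^(1/(p-1)))."""
    return exp(-(p - 1) / p * exp(1 / (p - 1)))


p = 1.5
grid = [1 + 0.01 * i for i in range(5000)]  # x ranging over [1, 51)
print(g(minimizer(p), p), minimum(p), min(g(x, p) for x in grid))
```

The first two printed values agree, and the grid minimum is never smaller than them, which matches setting the derivative of $(1-q)x\log x - x$ to zero.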



4 Appendix: A Proof of Theorem 3.1

We start with a useful property of the function $\theta$. Let $1/2 < q < 1$ be a real number.

Lemma 4.1: Let $k > 1$ and consider the continuous function $\theta(x) = x \cdot \left( \frac{(x-1)^{x-1}}{x^x} \right)^q$ of a real
variable $x$ on the interval $[1,k]$. If $x_0$ is a point of maximum of $\theta$ then $x_0 = 1$ or $x_0 = k$.

Proof: It is convenient to deal with $f(x) = \ln(\theta(x)) = \ln x - q \cdot (x \ln x - (x-1)\ln(x-1))$.
The derivative $f'(x) = \frac{1}{x} - q \ln\frac{x}{x-1} = \frac{q}{x} \cdot \left( \frac{1}{q} - x \ln\frac{x}{x-1} \right)$.

Consider the function $g(x) = x \ln\frac{x}{x-1}$ on $[1, \infty)$. The derivative $g'(x) = \ln\frac{x}{x-1} - \frac{1}{x-1} =
\ln\left(1 + \frac{1}{x-1}\right) - \frac{1}{x-1}$ is strictly negative. At the endpoints, $g(1) = \infty$ and $g(\infty) = 1$. Therefore
on $[1, \infty)$ the function $g$ decreases from $\infty$ to 1. Since $\frac{1}{q} = p > 1$ this means that there exists a
positive real number $x_q > 1$ depending only on $q$ such that $f'(x) < 0$ for $1 < x < x_q$, $f'(x_q) = 0$,
and $f'(x) > 0$ for $x > x_q$.

Consequently, $f$ is unimodal on $[1, \infty)$ with minimum in $x_q$. The claim of the lemma follows.

∎

The proof of the theorem proceeds by induction on $k$. For $k = 1$ the claim holds trivially.
For $k = 2$ we have

$$P(y) = P(y_1, 1 - y_1) = y_1^{2q} + (1 - y_1)^{2q}$$

For $q > 1/2$, the function $f(x) = x^{2q} + (1-x)^{2q}$ attains its maximum on $[0,1]$ at 0 and at 1.
This means that the points of maximum of $P$ are standard basis vectors, and the claim holds.

Assume the theorem is true for $2 \le l < k$.

Let $y^* \in \Delta$ be a point at which $P$ attains maximum. If $y^*$ has $1 < l < k$ non-zero
coordinates, then the induction hypothesis implies $y^*$ is the all-$1/l$ vector on its support. This is to say
$P(y^*) = \theta(l)$. However, Lemma 4.1 showed $\theta(l) < \max\{1, \theta(k)\}$, reaching a contradiction.

Therefore either $y^*$ is a standard basis vector, in which case we are done, or $y^*$ is an interior
point of $\Delta$. This is the remaining case. We will assume that $y^*$ is not the all-$1/k$ vector and
reach a contradiction.

Since $y^*$ is an interior extremum point, we can use the first and the second order optimality
conditions on the gradient and the Hessian of $P$ at $y^*$ to obtain information about $y^*$.

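Let $s_i(y) = y_i^q \prod_{j \ne i} (1 - y_j)^q$, for $i = 1, \ldots, k$. Of course $P = \sum_{i=1}^k s_i$.
Lemma 4.2: For all $i = 1, \ldots, k$

$$s_i(y^*) = y_i^* P(y^*)$$

Proof: We have $\frac{\partial s_i}{\partial y_i} = \frac{q s_i}{y_i}$ and, for $j \ne i$, $\frac{\partial s_i}{\partial y_j} = -\frac{q s_i}{1 - y_j}$. Therefore

$$\frac{\partial P}{\partial y_j} = q \left( \frac{s_j}{y_j} - \frac{P - s_j}{1 - y_j} \right) = q \cdot \frac{s_j - y_j P}{y_j (1 - y_j)}$$

The first order optimality conditions for $y^*$ say that there is a constant $\lambda$ such that for all $j =
1, \ldots, k$ holds $\frac{\partial P}{\partial y_j}(y^*) = \lambda$. This means that for $j = 1, \ldots, k$ holds $s_j(y^*) - y_j^* P(y^*) = \frac{\lambda}{q} \, y_j^* \left(1 - y_j^*\right)$.

Summing over $j$ we obtain

$$0 = \frac{\lambda}{q} \sum_{j=1}^k y_j^* \left(1 - y_j^*\right)$$

implying $\lambda = 0$. That is, for all $j = 1, \ldots, k$ holds $s_j(y^*) = y_j^* P(y^*)$. ∎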

Corollary 4.3: The coordinates of $y^*$ have two distinct values $a$ and $b$ with $a < 1 - q < b$.

Proof: Let $i \ne j$ be two distinct indices. By the lemma at $y^*$ we have $s_i = y_i^* P$ and $s_j = y_j^* P$.
This implies

$$\frac{y_i^*}{y_j^*} = \frac{s_i}{s_j} = \left( \frac{y_i^* \left(1 - y_j^*\right)}{y_j^* \left(1 - y_i^*\right)} \right)^q$$

This means $(y_i^*)^{1-q} (1 - y_i^*)^q = (y_j^*)^{1-q} (1 - y_j^*)^q$. Let $f(x) = x^{1-q}(1-x)^q$. We have shown
that $f(y_i^*) = f(y_j^*)$. Since the argument does not depend on the choice of $i$ and $j$, this implies
$f$ has the same value on all $y_i^*$, $i = 1, \ldots, k$.

The function $f$ is a concave function on $[0,1]$ vanishing at the endpoints, with maximum
at $1 - q$. Therefore $f$ takes each value at most twice, at two points lying on different sides of
$1 - q$. Bearing in mind that $y^*$ is not a constant vector, the claim of the corollary follows. ∎

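Next, we compute the Hessian of $P$. We have, for $i \ne j \ne t$

$$\frac{\partial^2 s_i}{\partial y_i^2} = -\frac{q(1-q) s_i}{y_i^2}, \qquad \frac{\partial^2 s_i}{\partial y_j^2} = -\frac{q(1-q) s_i}{(1 - y_j)^2}$$

$$\frac{\partial^2 s_i}{\partial y_i \partial y_j} = -\frac{q^2 s_i}{y_i (1 - y_j)}, \qquad \frac{\partial^2 s_i}{\partial y_j \partial y_t} = \frac{q^2 s_i}{(1 - y_j)(1 - y_t)}$$

Let $H = H(y)$ be the Hessian of $P$ at $y$. Then

$$\frac{\partial^2 P}{\partial y_j^2} = \sum_{i=1}^k \frac{\partial^2 s_i}{\partial y_j^2} = -q(1-q) \left( \frac{s_j}{y_j^2} + \frac{P - s_j}{(1 - y_j)^2} \right)$$

Similarly

$$\frac{\partial^2 P}{\partial y_j \partial y_t} = \sum_{i=1}^k \frac{\partial^2 s_i}{\partial y_j \partial y_t} = \frac{\partial^2 s_j}{\partial y_j \partial y_t} + \frac{\partial^2 s_t}{\partial y_j \partial y_t} + \sum_{i \ne j,t} \frac{\partial^2 s_i}{\partial y_j \partial y_t}$$

$$= -q^2 \left( \frac{s_j}{y_j (1 - y_t)} + \frac{s_t}{y_t (1 - y_j)} \right) + q^2 \, \frac{P - s_j - s_t}{(1 - y_j)(1 - y_t)}$$

At $y^*$ we have $s_i = y_i^* P$ for all $i = 1, \ldots, k$. Therefore for $H = H(y^*)$ we have

$$H(j,j) = -q(1-q) \, \frac{P}{y_j^* \left(1 - y_j^*\right)}$$

and

$$H(j,t) = q^2 P \left( -\frac{1}{1 - y_t^*} - \frac{1}{1 - y_j^*} + \frac{1 - y_j^* - y_t^*}{\left(1 - y_j^*\right)\left(1 - y_t^*\right)} \right) = -\frac{q^2 P}{\left(1 - y_j^*\right)\left(1 - y_t^*\right)}$$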



Lemma 4.4: $y^*$ has only one coordinate with value $b$. (And therefore $k - 1$ coordinates with
value $a$.)

Proof: We can write the Hessian at $y^*$ as $H = -qP \cdot (A + D)$, where $A$ is a rank-1 matrix
with $a_{ij} = \frac{q}{\left(1 - y_i^*\right)\left(1 - y_j^*\right)}$, and $D$ is a diagonal matrix with $d_{ii} = \frac{1 - q - y_i^*}{y_i^* \left(1 - y_i^*\right)^2}$.

The second order optimality conditions for $y^*$ say that $H$ is negative semidefinite on the
subspace $V$ of the vectors in $\mathbb{R}^k$ orthogonal to the all-1 vector. This means that the matrix
$B = A + D$ is positive semidefinite on $V$.

Assume for the moment that $y^*$ has two $b$-valued coordinates. Let these be the first two
coordinates. This means that

$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = q \cdot \begin{pmatrix} 1/(1-b)^2 & 1/(1-b)^2 \\ 1/(1-b)^2 & 1/(1-b)^2 \end{pmatrix}, \quad \text{and} \quad \begin{pmatrix} d_{11} & d_{12} \\ d_{21} & d_{22} \end{pmatrix} = \begin{pmatrix} \frac{1-q-b}{b(1-b)^2} & 0 \\ 0 & \frac{1-q-b}{b(1-b)^2} \end{pmatrix}.$$

Note that, since $b > 1 - q$, the diagonal values of the second matrix are negative.

Now, let $v \in V$, $v = (1, -1, 0, \ldots, 0)$. Then clearly

$$v B v^t = \frac{2(1 - q - b)}{b(1-b)^2} < 0,$$

contradicting positive semidefiniteness of $B$. This means that $y^*$ has only one coordinate valued
$b$. ∎

Consider the set $\Delta_1 \subset \Delta$ of stochastic vectors $y$ with $y_2 = \ldots = y_k = \frac{1 - y_1}{k-1}$. The preceding
lemma implies that there is a maximum point $y^*$ of $P$ in $\Delta_1$. Moreover $b = y_1^* > 1/k$.

$P$, restricted to $\Delta_1$, is a function of one variable $x = y_1$ and is given by

$$P(x) = x^q \left(1 - \frac{1-x}{k-1}\right)^{(k-1)q} + (k-1) \left(\frac{1-x}{k-1}\right)^q (1-x)^q \left(1 - \frac{1-x}{k-1}\right)^{(k-2)q}$$

$$= \frac{1}{(k-1)^{(k-1)q}} \cdot \left( x^q (k-2+x)^{(k-1)q} + (k-1)(1-x)^{2q} (k-2+x)^{(k-2)q} \right)$$

We will show that on the interval $[1/k, 1]$ this function attains its maximum either at $1/k$ or at
1. This means, recalling $y_1^* > 1/k$, that $y^*$ is a standard basis vector. This is a contradiction
to previous assumptions, and will complete the proof of the theorem.

Lemma 4.5: Let $k \ge 3$ be an integer, let $1/2 < q < 1$ be a real number, and let $f$ be a function
on $[1/k, 1]$ given by

$$f(x) = x^q (k-2+x)^{(k-1)q} + (k-1)(1-x)^{2q} (k-2+x)^{(k-2)q}$$

Then $f$ attains its maximum either at $1/k$ or at 1.

Proof: We compute the derivative of $f$.

$$f'(x) = q x^{q-1} (k-2+x)^{(k-1)q} + (k-1) q x^q (k-2+x)^{(k-1)q-1}$$

$$- 2(k-1) q (1-x)^{2q-1} (k-2+x)^{(k-2)q} + (k-1)(k-2) q (1-x)^{2q} (k-2+x)^{(k-2)q-1}$$

$$= q (k-2+kx) (k-2+x)^{(k-2)q-1} x^{q-1} \cdot \left( (k-2+x)^q - (k-1) x^{1-q} (1-x)^{2q-1} \right)$$

This means that the sign of $f'$ is determined by the sign of $(k-2+x)^q - (k-1) x^{1-q} (1-x)^{2q-1}$.
Since $t \mapsto t^q$ is monotone increasing, we can, as well, check the sign of

$$h(x) = (k-2+x) - (k-1)^{1/q} x^{1/q-1} (1-x)^{2-1/q}$$

The function $h(x)$ is strictly convex on $[1/k, 1]$, with $h(1/k) = 0$ and $h(1) = k-1 > 0$.
Therefore, there are two possible options.

• $h \ge 0$ on $(1/k, 1]$. This means that $f$ attains its maximum at 1.

• There is a point $x_0 \in (1/k, 1)$ such that $h < 0$ on $(1/k, x_0)$ and $h > 0$ on $(x_0, 1)$. This means
that $f$ attains its maximum at one of the endpoints $1/k$ or 1, and we are done.

∎

This completes the proof of Theorem 3.1.